This report identifies players who are high performing but not yet well paid, making them candidates for the team to recruit.
The first step is to see which variables are most correlated with salary:
The two variables most correlated with salary are points (PTS) and assists (AST), with correlations of 0.59 and 0.58, respectively. These will be used as the two input variables for the clustering algorithm. Players with both high points and high assists are considered high performing.
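The variable ranking described above can be sketched as follows. This is a minimal illustration in Python, not the code behind this report (the printed output suggests the report itself was produced in R). The function names and the toy values are assumptions made for the example, so the numbers it prints are not the 0.59 and 0.58 reported above.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def rank_by_correlation(columns, target):
    """Return (name, r) pairs for every column except the target, strongest first."""
    pairs = [
        (name, pearson(values, columns[target]))
        for name, values in columns.items()
        if name != target
    ]
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)

# Toy values, invented for illustration only (not the report's data).
toy = {
    "PTS": [100, 200, 300, 400, 500],
    "AST": [30, 20, 60, 50, 90],
    "TOV": [5, 9, 4, 8, 6],
    "salary": [1.0, 2.0, 3.5, 4.0, 5.5],
}

ranking = rank_by_correlation(toy, "salary")
for name, r in ranking:
    print(f"{name}: {r:.2f}")
```

Ranking by the absolute value of the correlation keeps strongly negative relationships near the top as well, which matters if a statistic tends to fall as salary rises.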
Using a clustering algorithm with two clusters (or groups), we obtain the following results:
We can use these results from the clustering algorithm to view salary as well, and identify which players may be high performing and underpaid.
The two-cluster solution accounts for 59.2% of the variance in the data. This is acceptable, but adding clusters should improve it.
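One common way to express "variance accounted for" by a clustering is the share of the total sum of squares that lies between clusters, which is the ratio R's `kmeans` output reports as `between_SS / total_SS`. Whether the 59.2% figure was computed exactly this way is an assumption. The sketch below computes that ratio from points and cluster labels; the function name and the toy points are hypothetical.

```python
def explained_variance(points, labels):
    """Share of the total sum of squares captured between clusters (1 - WSS/TSS)."""
    dims = len(points[0])

    def centroid(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dims)]

    def sum_sq(rows, center):
        return sum((r[d] - center[d]) ** 2 for r in rows for d in range(dims))

    total_ss = sum_sq(points, centroid(points))
    within_ss = 0.0
    for label in set(labels):
        members = [p for p, l in zip(points, labels) if l == label]
        within_ss += sum_sq(members, centroid(members))
    return 1.0 - within_ss / total_ss

# Toy (PTS, AST) pairs and labels, invented for illustration only.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [1, 1, 2, 2]

ratio = explained_variance(points, labels)
print(f"{ratio:.1%}")
```

A value near 100% means the clusters are tight relative to the overall spread; putting every player in a single cluster gives 0%.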
Two clusters give only a coarse split, and we want finer groups so that the truly high-performing players stand apart. We can use the elbow method to see how much variance is explained when using 2 to 10 clusters.
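The elbow method can be sketched as a loop that fits the clustering for each candidate number of clusters and records the total within-cluster sum of squares. The code below is a minimal illustration under stated assumptions: it uses plain Lloyd's k-means with a deterministic farthest-point start (chosen here so the example is reproducible, and not necessarily the algorithm used for this report), and the toy points and all function names are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first(points, k):
    """Deterministic start: first point, then repeatedly the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return [list(c) for c in centers]

def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm; returns (centers, labels)."""
    centers = farthest_first(points, k)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centers[c])  # an empty cluster keeps its old center
        if updated == centers:
            break
        centers = updated
    return centers, labels

def within_ss(points, centers, labels):
    """Total within-cluster sum of squares."""
    return sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))

# Toy normalized (PTS, AST) pairs forming three loose groups, invented for illustration.
toy = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
       (5.0, 5.1), (5.2, 4.9), (4.9, 5.3),
       (9.8, 0.2), (10.1, 0.0), (10.0, 0.4)]

curve = {k: within_ss(toy, *kmeans(toy, k)) for k in range(1, 7)}
for k, wss in curve.items():
    print(k, round(wss, 3))
```

On the toy data the within-cluster sum of squares drops sharply up to three clusters and only slightly afterwards, which is the "elbow" the method looks for. With real player data the bend is usually less clear-cut.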
Another way to identify the optimal number of clusters is the NbClust method, which determines the optimal number of clusters under a range of different criteria. Below is a graph of the number of votes each candidate number of clusters received.
The elbow method and the NbClust method disagree on the optimal number of clusters. We will run the algorithm using 4 clusters as a compromise between the two methods’ results that avoids overfitting.
Results after using 4 clusters:
You can hover over the plot to identify the points, assists, cluster, and player name. Note that this data is normalized; we will look at the true values in the following section.
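The report does not say which normalization was applied. A common choice before k-means is the z-score (subtract the mean, divide by the standard deviation), which is what R's `scale()` does by default using the sample standard deviation. The sketch below assumes that choice; the function name and the input values are hypothetical.

```python
import math

def z_scores(values):
    """Center to mean 0 and scale to sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

# Toy values, invented for illustration only.
print(z_scores([2.0, 4.0, 6.0]))  # [-1.0, 0.0, 1.0]
```

Normalizing puts points and assists on the same scale, so the variable with the larger raw range does not dominate the distance calculation.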
We want players with a high number of points and assists, but low current salaries. On the chart above, these are the data points in cluster 4 (x’s) that are lighter blue in color. Three points at the top right of the plot meet these criteria. These players are De’Aaron Fox, Luka Dončić, and Trae Young.
Data for these players:
## # A tibble: 3 x 7
##   Player     Pos     Age Tm      PTS   AST `2020-21`
##   <chr>      <chr> <dbl> <chr> <dbl> <dbl>     <dbl>
## 1 DeAaronFox PG       23 SAC     806   265   8099627
## 2 LukaDoni   PG       21 DAL     916   287   8049360
## 3 TraeYoung  PG       22 ATL     897   321   6571800
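The selection above was made visually from the cluster plot. The same idea can be sketched as an explicit filter: keep players above a points and assists floor and below a salary ceiling. The rows below are copied from the tables in this report, but the threshold values and the function name are assumptions chosen for illustration and were not used in the analysis.

```python
# Rows copied from this report's tables: (player, PTS, AST, 2020-21 salary).
players = [
    ("DeAaronFox", 806, 265, 8099627),
    ("LukaDoni", 916, 287, 8049360),
    ("TraeYoung", 897, 321, 6571800),
    ("JimmyButler", 452, 172, 34379100),
    ("JohnWall", 526, 151, 41254920),
    ("MikeConley", 466, 164, 34504132),
]

def undervalued(rows, min_pts, min_ast, max_salary):
    """Players meeting both performance floors and the salary ceiling, cheapest first."""
    hits = [r for r in rows if r[1] >= min_pts and r[2] >= min_ast and r[3] <= max_salary]
    return sorted(hits, key=lambda r: r[3])

# Hypothetical thresholds, picked only to illustrate the filter.
picks = undervalued(players, min_pts=600, min_ast=180, max_salary=10_000_000)
for name, pts, ast, salary in picks:
    print(name, pts, ast, salary)
```

With these thresholds the filter returns the same three point guards, ordered from lowest to highest salary.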
All three of these players are point guards. Let’s find some players at other positions that we would also want to recruit.
## # A tibble: 2 x 7
##   Player                Pos     Age Tm      PTS   AST `2020-21`
##   <chr>                 <chr> <dbl> <chr> <dbl> <dbl>     <dbl>
## 1 DonovanMitchell       SG       24 UTA     839   183   5195501
## 2 ShaiGilgeousAlexander SG       22 OKC     697   187   4141320
Both of these players are shooting guards who perform at a high level but are not yet highly paid. They would be good additions to the team.
Players that the team definitely does not want are those who are already very highly paid, low performing, or both. Three that we definitely do not want on the team are Jimmy Butler, John Wall, and Mike Conley.
Each of these players is highly paid and has fewer points and assists than the players identified in the previous section. Here are their stats:
## # A tibble: 3 x 7
##   Player      Pos     Age Tm      PTS   AST `2020-21`
##   <chr>       <chr> <dbl> <chr> <dbl> <dbl>     <dbl>
## 1 JimmyButler SF       31 MIA     452   172  34379100
## 2 JohnWall    PG       30 HOU     526   151  41254920
## 3 MikeConley  PG       33 UTA     466   164  34504132
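We are unsure about the following players, who are paid less than the three above but also have far fewer points than the recommended players: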
## # A tibble: 3 x 7
##   Player        Pos     Age Tm      PTS   AST `2020-21`
##   <chr>         <chr> <dbl> <chr> <dbl> <dbl>     <dbl>
## 1 GarrettTemple SG       34 CHI     290    68   4767000
## 2 TJMcConnell   PG       28 IND     215   216   3500000
## 3 WillBarton    SF       30 DEN     383   102  13920000